Skip to content

Support deserializing dictionary encoded arrays with null values#308

Closed
bennetthardwick wants to merge 1 commit into
chmp:mainfrom
foxglove:main
Closed

Support deserializing dictionary encoded arrays with null values#308
bennetthardwick wants to merge 1 commit into
chmp:mainfrom
foxglove:main

Conversation

@bennetthardwick
Copy link
Copy Markdown
Contributor

Arrow allows dictionary encoded arrays to have null values. These values are not specified as null in the validity bitmap but are known as "logical nulls" and should still be supported.

Currently serde_arrow will return an error when attempting to deserialize a dictionary encoded array with null values.

This change adds support for them and allows them to deserialize.

Fixes #307.

Arrow allows dictionary encoded arrays to have null values. These values
are not specified as null in the validity bitmap but are known as
"logical nulls" and should still be supported.

Currently serde_arrow will return an error when attempting to
deserialize a dictionary encoded array with null values.

This change adds support for them and allows them to deserialize.

Fixes chmp#307.

#[test]
fn some_null_values_v2() {
fn some_null_values_out_of_range() {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this test is really required anymore. It seems to be testing that if the dictionary contains a null but isn't used, it still errors out.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Not sure what I was thinking here. I would probably remove the test.

Copy link
Copy Markdown
Owner

@chmp chmp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really cool. Thanks a lot.

Can you maybe expand the test coverage slightly and update the Changes.md?


#[test]
fn some_null_values_v2() {
fn some_null_values_out_of_range() {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Not sure what I was thinking here. I would probably remove the test.

Comment thread serde_arrow/src/test_with_arrow/impls/arrow_dictionary.rs

#[test]
fn all_null_values() {
let array = Array::Dictionary(DictionaryArray {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a test with nullable keys? (which is supported by your impl and the format, if I read it correctly)

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't seem to push into your branch, otherwise I would've added the test myself, that's what I would've done:

    #[test]
    fn some_null_keys() {
        let array = Array::Dictionary(DictionaryArray {
            keys: Box::new(to_array(DataType::Int8, true, [Some(1), None, Some(0)])),
            values: Box::new(to_array(DataType::Utf8, false, ["foo", "bar"])),
        });

        let deserializer =
            ArrayDeserializer::new(String::from("$"), None, array.as_view()).unwrap();

        assert_eq!(
            Option::<String>::deserialize(deserializer.at(0)).unwrap(),
            Some(String::from("bar")),
        );
        assert_eq!(
            Option::<String>::deserialize(deserializer.at(1)).unwrap(),
            None
        );
        assert_eq!(
            Option::<String>::deserialize(deserializer.at(2)).unwrap(),
            Some(String::from("foo")),
        );
    }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I'm going to re-open this from a personal fork I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support null values in dictionary arrays

2 participants